Email Processing Orchestration
This document describes the email processing orchestration workflow that prioritizes placement offers over general notices, classifies incoming emails, extracts structured content using LLMs, and integrates with official placement data sources. It explains the decision trees for categorization and processing priorities, the LLM-powered extraction pipeline for placement services, and the official placement service integration. It also covers concurrent processing strategies and how the system maintains processing order during email volume spikes.
The email processing orchestration spans several modules:
CLI orchestration and scheduling: main entry points and schedulers
Email processing services: placement and general notice classification/extraction
Formatting and notification: transforming events into notices and broadcasting
Official data integration: scraping and storing government portal data
Configuration and dependency injection: centralized settings and DI
Priority-based orchestration: placement offers are processed first; only non-placement emails are routed to general notice classification.
LLM-powered classification and extraction: placement uses keyword-based classification plus LLM extraction; general notices use LLM-based classification and extraction.
Policy handling: placement policy updates are detected and processed separately from regular notices.
Official placement integration: periodic scraping of government portal data for batch-wise placement statistics.
Notification formatting and dispatch: placement events are transformed into notices and broadcast via configured channels.
The orchestration follows a deterministic priority flow:
Fetch unread email IDs from Google Groups.
For each email:
Attempt placement offer detection using placement_service.
If not a placement offer, attempt general notice classification using email_notice_service.
Persist results and mark email as read.
Periodically scrape official placement data and integrate into the system.
Broadcast unsent notices via Telegram and/or Web Push.
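The priority flow above can be sketched as a single loop. The function names here (`fetch_email`, `classify_placement`, `classify_notice`, `mark_read`) are illustrative stand-ins for `placement_service` and `email_notice_service`, not the real API:

```python
def process_emails(email_ids, fetch_email, classify_placement, classify_notice,
                   mark_read):
    """Process each unread email in priority order.

    Placement detection runs first; only on rejection does the email fall
    through to general-notice classification.  Every email is marked read
    exactly once, which is what prevents reprocessing on the next pass.
    """
    results = {}
    for email_id in email_ids:
        body = fetch_email(email_id)
        offer = classify_placement(body)          # keyword scoring + LLM validation
        if offer is not None:
            results[email_id] = ("placement", offer)
        else:
            notice = classify_notice(body)        # pure LLM classification
            results[email_id] = ("notice", notice) if notice else ("skipped", None)
        mark_read(email_id)                       # every branch marks as read
    return results

# Stub demo: one placement mail, one general notice, one irrelevant mail.
mails = {1: "Final placement offer from Acme", 2: "Exam schedule", 3: "lunch?"}
read = []
out = process_emails(
    mails, mails.get,
    classify_placement=lambda b: b if "placement offer" in b else None,
    classify_notice=lambda b: b if "schedule" in b else None,
    mark_read=read.append,
)
```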
Priority-Based Email Handling
Placement offers are processed first using a hybrid approach: keyword-based initial scoring followed by LLM validation to ensure only final placement offers are accepted.
Non-placement emails are routed to the general notice pipeline, which relies purely on LLM classification to distinguish valid notices from spam or irrelevant content.
Policy updates are detected and processed separately to maintain clean separation of concerns.
Diagram (recovered from the flow description): placement detection runs first; a confirmed offer is saved, events are generated, PlacementNotificationFormatter.process_events() turns them into a placement notice, and the notice is saved. If the email is not a placement offer, EmailNoticeService.process_single_email() either saves a general notice, hands a policy update to PlacementPolicyService.process_policy_email(), or skips irrelevant mail. All branches end by marking the email as read.
Email Classification and Extraction Pipelines
Placement Service Pipeline
Classification: keyword-based scoring with confidence thresholds, plus LLM validation for final placement offers.
Extraction: LLM-based structured extraction with retry logic and validation.
Privacy sanitization: removal of headers and forwarded markers.
Output: PlacementOffer objects persisted to database and transformed into notices.
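The keyword-based scoring stage can be sketched as follows. The keyword weights and the 0.5 cutoff are illustrative assumptions; the real service's indicator list and confidence threshold are not specified in this document:

```python
# Illustrative keyword weights; not the real service's indicator set.
PLACEMENT_KEYWORDS = {"placement": 3, "offer": 2, "ctc": 2, "shortlisted": 1}

def keyword_confidence(text: str) -> float:
    """Confidence = matched keyword weight / total keyword weight."""
    lowered = text.lower()
    matched = sum(w for kw, w in PLACEMENT_KEYWORDS.items() if kw in lowered)
    return matched / sum(PLACEMENT_KEYWORDS.values())

def is_placement_candidate(text: str, threshold: float = 0.5) -> bool:
    """Stage 1 of the hybrid pipeline; candidates still need LLM validation."""
    return keyword_confidence(text) >= threshold
```

Only emails passing this cheap filter are sent to the LLM, which keeps token spend proportional to the number of genuine candidates.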
General Notice Service Pipeline
Classification: LLM-based classification without keyword filtering; placement offers are explicitly excluded.
Extraction: LLM-based extraction with structured schema validation and retry logic.
Policy detection: separate LLM pass for placement policy updates with dedicated schema.
Output: NoticeDocument objects saved to database and formatted for notifications.
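Schema validation on the extraction output can be sketched like this; the field names on `NoticeDocument` are assumptions about a minimal shape, not the real schema:

```python
from dataclasses import dataclass

@dataclass
class NoticeDocument:
    """Assumed minimal shape of a stored notice; the real schema may differ."""
    title: str
    category: str
    body: str

def validate_notice(raw: dict) -> NoticeDocument:
    """Reject LLM output with missing or empty fields, triggering a retry."""
    bad = [k for k in ("title", "category", "body")
           if not isinstance(raw.get(k), str) or not raw[k].strip()]
    if bad:
        raise ValueError(f"invalid or missing fields: {bad}")
    return NoticeDocument(raw["title"], raw["category"], raw["body"])
```

Raising `ValueError` on bad output (rather than silently coercing) is what gives the retry loop a concrete error message to feed back into the next prompt.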
LLM-Powered Content Extraction for Placement Services
Structured extraction prompts enforce strict schema compliance for company, roles, packages, students, and supporting details.
Robust retry logic with validation error accumulation and maximum retry limits.
Privacy sanitization removes headers, forwarded markers, and sensitive metadata.
Package conversion and normalization ensure consistent units (e.g., LPA) and handling of ranges and stipends.
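A minimal sketch of the retry-with-error-accumulation and unit-normalization ideas, under the assumption that validation errors are folded back into the re-prompt and that packages arrive in one of a few known units (the conversion set here is illustrative):

```python
def extract_with_retry(call_llm, validate, max_retries=3):
    """Re-prompt on schema failures, accumulating validation errors so each
    retry can tell the model what was wrong with the previous attempt."""
    errors = []
    for _ in range(max_retries):
        raw = call_llm(errors)          # errors are folded into the re-prompt
        try:
            return validate(raw)
        except ValueError as exc:
            errors.append(str(exc))
    raise RuntimeError(f"extraction failed after {max_retries} attempts: {errors}")

def to_lpa(amount: float, unit: str) -> float:
    """Normalize package figures to lakhs per annum (1 lakh = 1e5 INR)."""
    factors = {"lpa": 1.0, "inr_per_annum": 1e-5, "k_per_month": 0.12}
    return round(amount * factors[unit], 2)

# Demo: first attempt fails validation, second succeeds.
def _validate(raw):
    if raw.get("pkg") is None:
        raise ValueError("pkg missing")
    return raw["pkg"]

attempts = iter([{"pkg": None}, {"pkg": 1200000}])
pkg_inr = extract_with_retry(lambda errors: next(attempts), _validate)
```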
Official Placement Service Integration
Scrapes official JIIT placement page for batch-wise statistics and pointers.
Extracts placement pointers and package distribution tables.
Normalizes and stores structured data for downstream analytics and notifications.
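Table extraction from the scraped page might look like the sketch below. The sample markup, column order, and figures are all stand-ins; the real portal's HTML structure is not described in this document:

```python
import re

# Representative fragment of a batch-wise statistics table; the markup,
# column order, and numbers are illustrative assumptions.
SAMPLE_HTML = """
<tr><td>2024</td><td>7.2</td><td>54.0</td></tr>
<tr><td>2023</td><td>6.8</td><td>44.0</td></tr>
"""

ROW = re.compile(
    r"<tr><td>(\d{4})</td><td>([\d.]+)</td><td>([\d.]+)</td></tr>"
)

def parse_batch_stats(html: str):
    """Extract (batch, average LPA, highest LPA) rows for storage."""
    return [{"batch": int(b), "avg_lpa": float(a), "high_lpa": float(h)}
            for b, a, h in ROW.findall(html)]
```

A real scraper would use a proper HTML parser rather than regular expressions; the sketch only shows the normalization step from markup to structured rows.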
Decision Trees and Fallback Mechanisms
Placement classification decision tree:
Keyword-based confidence score threshold determines initial relevance.
LLM final validation ensures only final placement offers are accepted.
On rejection, the system proceeds to general notice classification.
General notice classification decision tree:
LLM-based classification excludes placement offers and spam.
Policy updates are detected and processed via a separate extraction pass.
On validation failure, retry up to configured limit; otherwise reject.
Fallback mechanisms:
Policy updates are handled independently and saved as policy documents.
Non-relevant emails are marked as read to prevent reprocessing.
Unsent notices are saved and broadcast via the notification service.
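The fallback routing can be sketched as a small dispatcher; the outcome labels and handler names are illustrative:

```python
def route_outcome(outcome: str, email_id, handlers, mark_read):
    """Route a classified email to its handler.

    Every branch, including the irrelevant fallback (no handler), ends in
    mark_read, which is what prevents reprocessing on the next pass.
    """
    handler = handlers.get(outcome)     # e.g. "placement", "notice", "policy"
    if handler is not None:
        handler(email_id)
    mark_read(email_id)

# Demo: a policy update is saved; irrelevant mail is only marked read.
saved, read = [], []
handlers = {"policy": lambda e: saved.append(("policy", e))}
route_outcome("policy", 7, handlers, read.append)
route_outcome("irrelevant", 8, handlers, read.append)
```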
Concurrent Processing Strategies and Volume Handling
Sequential processing per email within a single orchestration loop to maintain ordering and avoid race conditions.
Separate schedulers for automated updates and bot commands to isolate workloads.
Retry logic with bounded attempts to handle transient LLM or parsing failures.
Daemon mode support for long-running processes with controlled logging.
Notification batching via NotificationService to minimize channel overhead.
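Notification batching can be sketched as follows; `batch_size` is illustrative, not a documented setting of NotificationService:

```python
def broadcast_in_batches(notices, send_batch, batch_size=10):
    """Group unsent notices per channel call to cut per-message overhead.

    One send_batch call per slice replaces one call per notice, which is
    the overhead reduction the batching strategy is after.
    """
    sent = 0
    for start in range(0, len(notices), batch_size):
        batch = notices[start:start + batch_size]
        send_batch(batch)
        sent += len(batch)
    return sent

# Demo: 25 notices go out in three channel calls of 10, 10, and 5.
calls = []
total = broadcast_in_batches(list(range(25)), calls.append, batch_size=10)
```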
Centralized configuration via Settings, together with safe_print utilities, enables consistent logging and daemon behavior across services.
Dependency injection is used extensively to enable testability and modular composition.
Email clients and database clients are instantiated per-process or per-runner to avoid cross-service coupling.
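The per-runner construction pattern can be sketched like this; the Settings field names and runner shape are assumptions, not the real code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative centralized settings; real field names are assumptions."""
    daemon_mode: bool = False
    max_retries: int = 3

def build_runner(settings: Settings, make_email_client, make_db_client):
    """Constructor injection: each runner gets fresh clients from factories,
    so services never share connections across processes."""
    return {
        "settings": settings,
        "email": make_email_client(),
        "db": make_db_client(),
    }

# Demo: two runners built from the same factories hold distinct clients.
r1 = build_runner(Settings(), list, dict)
r2 = build_runner(Settings(), list, dict)
```

Passing factories rather than instances is what makes the runners testable: a test can inject in-memory fakes without touching global state.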
LLM calls are rate-limited by sequential processing; consider batching and caching for high-volume scenarios.
Bounded retry logic avoids repeatedly reprocessing failing emails; add exponential backoff if retry limits are extended.
Database writes are optimized by upsert operations and pre-filtering of existing IDs.
Official placement scraping is scheduled at off-peak hours to minimize impact on primary processing.
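A minimal exponential-backoff sketch for extended retries; the document only recommends exponential backoff, so the full-jitter variant and the parameter values here are assumptions:

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=5, jitter=True):
    """Yield exponentially growing delays, capped, with optional full jitter.

    Jitter spreads retries out so many workers recovering from the same
    transient LLM or parsing failure do not retry in lockstep.
    """
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        yield random.uniform(0, delay) if jitter else delay
```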
Placement offers not detected:
Verify keyword indicators and LLM prompts are aligned with actual email content.
Check validation errors and retry counts in the placement pipeline.
General notices not appearing:
Confirm LLM classification prompt excludes placement offers and spam.
Review policy update detection and fallback handling.
Notification delivery issues:
Inspect NotificationService results and channel-specific errors.
Ensure database contains unsent notices and proper channel configuration.
The email processing orchestration implements a robust, priority-driven workflow that separates placement offers from general notices, leverages LLMs for accurate classification and extraction, and integrates official placement data for comprehensive coverage. The system’s decision trees, retry mechanisms, and concurrent processing strategies ensure reliability and scalability, while dependency injection and centralized configuration keep the system modular and testable.